ABSTRACT

Accurate loudness measurement is imperative for intelligent 
music mixing systems, where one of the most fundamental 
tasks is to automate the fader balance. The goal of this short
paper is to introduce state-of-the-art loudness algorithms to
the automatic mixing community and to give insight into their
differences when applied to multi-track audio.


1. INTRODUCTION 

Loudness models of varying computational complexity have
been used to automatically balance the levels of multi-track
audio [1–3], yet little is known about how well they measure
the relative loudness of individual instruments. For example,
although [2] reported success using the EBU short-term
loudness measure [4], [5] revealed a tendency for this metric
to underestimate the subjective loudness of percussive
material. This paper compares the predictions of multiband
and single band loudness models, using different descriptors
to quantify the overall loudness of a sound, and suggests
directions for future work in the field.

2. METHODOLOGY 


Three multiband models, GM02 [6], CF02 [7] and CH12 [8],
and three single band models, LARM [9], EBU [4] and V01
[10], were compared. Given a waveform, each algorithm
outputs a loudness time function, from which a single value
representing overall loudness must be derived. Developers
generally recommend a statistic of central tendency, e.g.
the mean long-term loudness for the GM02. We use the term
‘mean’ to denote some form of temporal average, and ‘peak’
to denote the maximum. Each algorithm was used to equalise
the loudness of 110 short segments (RMS level = 73 dB SPL)
of multi-track audio spanning a range of genres. The target
loudness was taken as the average loudness of all segments,
and an iterative procedure was used for the non-linear
models. The GM02 and CH12 were configured to run at a
lower complexity, following suggestions given in [11]. The
resulting level balances were centred on zero.
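The equalisation procedure above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `toy_loudness` is a hypothetical stand-in for a real loudness model (frame-wise power raised to a compressive exponent of 0.3), and bisection on the gain in dB is only one possible iterative procedure, since the paper does not say which was used. The search relies solely on loudness increasing monotonically with gain, so any of the compared models could be substituted for the toy one.

```python
import numpy as np

def toy_loudness(x, frame=1024, hop=512):
    """HYPOTHETICAL stand-in for a loudness model: frame-wise mean power
    raised to 0.3 (a Stevens-like compressive exponent). Returns a time function."""
    n_frames = 1 + max(0, (len(x) - frame) // hop)
    frames = np.stack([x[i * hop:i * hop + frame] for i in range(n_frames)])
    power = np.mean(frames ** 2, axis=1)
    return power ** 0.3

def global_loudness(x, descriptor="mean"):
    """Collapse the loudness time function to one value ('mean' or 'peak')."""
    lt = toy_loudness(x)
    if descriptor == "mean":
        return float(lt.mean())
    if descriptor == "peak":
        return float(lt.max())
    raise ValueError(f"unknown descriptor: {descriptor!r}")

def gain_for_target(x, target, descriptor="mean", lo=-60.0, hi=60.0, tol=1e-3):
    """Bisect on the gain (dB) until the descriptor reaches the target loudness.
    Assumes loudness increases monotonically with gain."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if global_loudness(x * 10 ** (mid / 20), descriptor) < target:
            lo = mid  # still too quiet: search higher gains
        else:
            hi = mid  # loud enough: search lower gains
    return 0.5 * (lo + hi)

def equalise(segments, descriptor="mean"):
    """Per-segment gains (dB) for equal loudness, centred on zero.
    The target is the average loudness of all segments."""
    target = np.mean([global_loudness(s, descriptor) for s in segments])
    gains = np.array([gain_for_target(s, target, descriptor) for s in segments])
    return gains - gains.mean()
```

For two copies of the same tone 20 dB apart, this sketch returns centred gains of roughly -10 dB and +10 dB with either descriptor, because the toy model scales identically in every frame.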

3. RESULTS 

Figure 1(A) shows the distribution of RMS levels after
loudness equalisation, using peak loudness in the case of the
multiband models. The spread of levels is markedly wider for
the multiband procedures, indicating a greater sensitivity to
the physical characteristics of the stimuli. Within each model,
the largest 5% of positive gains were predominantly applied
to bass instruments, indicating a common strategy across the
algorithms of attenuating low frequencies when measuring
loudness. The EBU programme loudness gives the narrowest
spread, with 50% of the segment levels within 1 dB of the
input level. The EBU, followed by LARM, was therefore the
most consistent with a simple energy measurement. In
contrast, the GM02 (mean loudness) applied gains as high as
31.6 dB to achieve equal loudness (for a bass drum segment).
Thus, for projects involving a range of instruments, very
different mixes can be expected from the different
algorithms. Subplot (B) shows the RMS errors between
pairwise combinations of level balances. The single band
models agree more closely with one another than the
multiband models do. The GM02 and CF02 yield notably
different balances from those generated by the single band
algorithms, especially when using the mean loudness
descriptor.
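The pairwise comparison in subplot (B) reduces to an RMS difference between gain vectors. A minimal sketch, assuming each level balance is stored as an array of per-segment gains in dB over the same segments in the same order; the dictionary layout and the function name are our own, hypothetical choices.

```python
import numpy as np

def rmse_matrix(balances):
    """Pairwise RMSE (dB) between level balances.

    balances: dict mapping model name -> 1-D sequence of per-segment gains (dB),
    all computed for the same segments in the same order."""
    names = list(balances)
    gains = np.array([balances[n] for n in names], dtype=float)  # (models, segments)
    diff = gains[:, None, :] - gains[None, :, :]                 # (models, models, segments)
    return names, np.sqrt(np.mean(diff ** 2, axis=-1))
```

Because the balances are centred on zero, a constant offset between two models cannot inflate their RMSE; only differences in the relative balance of the segments contribute.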

Table 1 gives the RMSEs between the balances obtained
using the two global loudness descriptors (mean and peak).
The choice of descriptor influenced the level balance most for
the CF02, followed by the GM02. Our findings indicate that,
for the multiband algorithms, peak loudness is more
appropriate when the sound corpus involves transient
instruments, since averaging the loudness time series tends to
underestimate salient peaks, unless envelope detectors or
temporal weightings designed to emphasise them are
incorporated, as done by LARM and the V01, respectively.
Interestingly, the gains predicted using the EBU programme
loudness and the maximum EBU momentary loudness differ by
only 1.6 dB on average.
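The claim that averaging underestimates salient peaks can be made concrete with two hypothetical loudness time functions of equal length: a sustained sound and a sparse percussive one. The numbers are invented for illustration, and converting a loudness ratio into a gain with the rule of thumb that loudness doubles for every 10 dB increase is our assumption; each model compared in this paper has its own level-to-loudness mapping.

```python
import numpy as np

def descriptors(loudness_tf):
    """Return the (mean, peak) global descriptors of a loudness time function."""
    lt = np.asarray(loudness_tf, dtype=float)
    return float(lt.mean()), float(lt.max())

def gain_to_target_db(loudness, target):
    """Gain (dB) moving `loudness` to `target`, ASSUMING loudness doubles per 10 dB."""
    return 10.0 * np.log2(target / loudness)

# Invented loudness time functions (arbitrary loudness units, 100 frames each).
sustained = np.full(100, 10.0)   # steady level, e.g. a held chord
percussive = np.zeros(100)
percussive[::25] = 40.0          # four short, loud hits

for name, lt in [("sustained", sustained), ("percussive", percussive)]:
    mean_l, peak_l = descriptors(lt)
    print(f"{name}: mean={mean_l:.1f}, peak={peak_l:.1f}, "
          f"gain to 10 units via mean={gain_to_target_db(mean_l, 10.0):+.1f} dB, "
          f"via peak={gain_to_target_db(peak_l, 10.0):+.1f} dB")
```

For the sustained function both descriptors agree, whereas for the percussive one the mean calls for about +26.4 dB and the peak for -20.0 dB, so the descriptor alone can swing the predicted gain by tens of decibels for transient material.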


Model   RMSE (dB)   95% CI (dB)
GM02    5.6         [4.7, 6.4]
CF02    6.9         [5.7, 8.2]
CH12    2.7         [2.2, 3.2]
LARM    2.2         [1.9, 2.6]
EBU     1.6         [1.4, 1.7]
V01     2.0         [1.7, 2.2]

Table 1: RMSE (and bootstrapped 95% confidence intervals)
between the loudness balances obtained using mean and peak
loudness.
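The confidence intervals in Table 1 were obtained by bootstrapping, but the exact resampling scheme is not stated. The sketch below assumes a percentile bootstrap over segments, resampling pairs of gains (mean-descriptor balance, peak-descriptor balance) with replacement; the function names and default parameters are hypothetical.

```python
import numpy as np

def rmse(a, b):
    """Root-mean-square difference (dB) between two level balances."""
    return float(np.sqrt(np.mean((np.asarray(a) - np.asarray(b)) ** 2)))

def bootstrap_rmse_ci(a, b, n_boot=10000, level=0.95, seed=0):
    """Percentile-bootstrap CI for the RMSE between two level balances.
    Segments are resampled with replacement, keeping (a[i], b[i]) pairs together."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(a), size=(n_boot, len(a)))  # resampled segment indices
    sq = (a - b) ** 2
    boot = np.sqrt(sq[idx].mean(axis=1))                  # one RMSE per resample
    alpha = (1 - level) / 2
    lo, hi = np.quantile(boot, [alpha, 1 - alpha])
    return rmse(a, b), (float(lo), float(hi))
```

Resampling whole segments keeps each pair of gains together, so the interval reflects variability across the audio corpus rather than across descriptors.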
Figure 1: (A) Violin plots of the stimulus RMS levels after loudness equalisation, and (B) RMSE matrix for assessing balance similarity. The
green horizontal line in (A) shows the input level of all segments. The subscript p denotes peak loudness.


4. CONCLUSION 

Listening tests conducted by the authors suggest that the
reference single band algorithms (mean descriptor) are robust
to a broad range of content, and that the large gains predicted
by some of the complex auditory models may not be realistic.
However, single band models do not account for auditory
masking, a perceptual phenomenon that complicates many music
production tasks. For such tasks, partial loudness calculation
is more important, but further research into the generalisation
of auditory models is needed first. In line with [5], future
work should concentrate on fitting loudness models to a
subjective reference dataset involving multi-track content,
rather than programme material. Although the necessary
empirical data are not yet available, the audio segments,
loudness predictions, level balances and details of model
configurations are freely available at:
https://code.soundsoftware.ac.uk/hg/wimpl6-ward-reiss 